Method and apparatus for estimating motion vector fields by rejecting local outliers.
An image analysis system generates a motion vector field from first and second images in a sequence of
steps. First, random, initial motion vector values are assigned to each pixel in the image frame. Next, an 
objective function is defined for each motion vector and its corresponding motion vector in the other frame. The 
differences between the motion vectors of the target pixel value in the current frame and the other frame are 
minimized by minimizing the objective function values, first based on all neighboring pixel values. Next, the 
motion vector differences are further minimized by reducing the objective function values by rejecting motion 
vectors which correspond to the neighboring pixel values for which the value of the difference between the 
respective objective functions is outside of a threshold range. In one embodiment of the invention, pixel values in 
the first image frame are compared to neighboring pixel values in a second image frame which occurs after the 
first image and to other neighboring pixel values in a third image frame which occurs before the first image 
frame. This embodiment substantially eliminates motion vector errors due to both motion discontinuities and 
occlusion of pixels in the image. 
FIELD OF THE INVENTION

The present invention relates to the analysis of sequences of images, made up of picture elements 
(pixels), which exhibit motion and, in particular, to apparatus and a method for estimating motion vector
fields in a sequence of moving images.

BACKGROUND OF THE INVENTION 

A motion vector field is a pixel-by-pixel map of image motion from one image frame to the next image 
frame. Each pixel in the frame has a motion vector which defines a matching pixel in the next frame or in a
previous frame. The combination of these motion vectors is the motion vector field. 

Although the techniques described herein could easily be applied to image components other than 
frames, such as image fields or portions of image frames, the description below refers only to image frames 
so as to avoid confusion in terminology with the fields of motion vectors. 
The estimation of motion vector fields is an important task in many areas of endeavor such as computer
vision, motion compensated coding of moving images, image noise reduction and image frame-rate
conversion. The problem of estimating motion vector fields is inherently difficult to understand. This is
because many different sets of motion vector fields can be used to describe a single image sequence. 

One simple approach is to assume that a block of pixels moves with the same kind of motion such as 
constant translation or an affine motion. This kind of block matching approach frequently fails to produce a 
good estimation of motion because it disregards the motion of pixels outside of the block. Thus, the motion 
model may be incorrect for describing the true motion of pixels within a block when the block size is large
and may be significantly affected by noise when the block size is small. 

Conventional approaches to the problem of estimating motion vector fields typically require simultaneously
solving equations having several thousand unknown quantities. Numerous techniques, based on
gradients, correlation, spatiotemporal energy functions and feature matching have been proposed. These
techniques have relied upon local image features such as the intensity of individual pixels and on more
global features such as edges and object boundaries.

Recently two processes have been proposed which have successfully solved two problems in motion 
vector estimation: motion vector discontinuity and occlusion. The first of these processes is the
"line process" described in a paper by J. Konrad et al. entitled "Bayesian Estimation of Motion Vector
Fields", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 14, pp 910-927, Sept. 1992.
The second process is the "occlusion process" described in a paper by R. Depommier et al. entitled
"Motion Estimation with Detection of Occlusion Areas", IEEE International Conference on Acoustics and
Speech Signal Processing, pp. III 269-272, 1992. Although successful, these processes increase substantially
the number of unknowns that need to be estimated and also introduce other parameters particular to
the line and/or occlusion processes.

Global formulations over the complete motion field have been proposed to deal with this deficiency of 
the block matching techniques. One such formulation is proposed by B. Horn et al. in a paper entitled
"Determining Optical Flow", Artificial Intelligence, vol. 17, pp 185-203, 1981. According to this proposal,
motion vectors are estimated by minimizing the error of the motion constraint equation and the error of
motion smoothness over the entire image. In this formulation, the motion constraint equation is derived from
the assumption that the image intensity is constant along the motion trajectory. Any departure from this
assumed smooth motion is measured as the square of the magnitude of the gradient of motion vectors.
While this approach improves the handling of general types of motion, such as elastic motion, it tends to
blur the motion vector fields at places where the motion is not continuous (i.e. at motion boundaries).

In a paper by E. Hildreth entitled "Computations Underlying the Measurement of Visual Motion",
Artificial Intelligence, vol. 23, pp 309-354, 1984, a partial solution to the problem of handling motion
boundaries is proposed. According to this proposal, the motion vector field is assumed to be smooth only 
along a contour but not across it. This proposal overcomes the blurring problem. Because, however, motion 
vectors at points not lying along contours cannot be obtained, this technique cannot propagate motion 
information across contours, such as those due to textures, which do not correspond to motion boundaries. 
These types of contours are common in real-world images. 

As described above, a technique which combines the line process along with Markov random field 
modeling and stochastic relaxation has been proposed by S. Geman et al. in a paper entitled "Stochastic
Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images", IEEE Transactions on Pattern
Analysis and Machine Intelligence, vol. 6, pp 721-741, Nov. 1984. The described technique was used for
restoring degraded images. In this context, a line process is a boolean field to mark the image intensity
boundaries. Other researchers have adapted this idea to overcome the blurring problem of an estimated
motion vector field by modifying the line process to indicate motion boundaries. An example of this
technique is contained in the above referenced paper by J. Konrad et al. One drawback of this method is that
one additional unknown must be introduced for every two adjoining pixels in order to implement the line 
process. These additional unknowns greatly increase the computational overhead of any algorithm which 
employs this method. 

Occlusion, by definition, means that part of the image cannot find a matching part in another image 
which corresponds to the same part of the scene. That part of the image was occluded from one image 
frame to the next. Occlusion appears quite often in real-world images when, for example, one object moves 
in front of another object, an object moves toward the camera, or objects rotate. If only two frames are 
used, it is difficult to obtain a good estimate of motions with occlusion because, for at least some parts of 
one image, there is no corresponding image part in the other image. 

One simple solution to this problem is to use three image frames, a target frame and the frames 
occurring immediately before and immediately after the target frame. In most cases of real-world images, a 
matching portion for image parts in the middle frame can be found in either the preceding or succeeding 
frame. The above referenced paper by Depommier et al. proposes a combination of the line process, as set 
forth in the Konrad et al. paper with an occlusion process to detect occlusion areas using three frames. One 
drawback of this combination, however, is that it requires even more unknowns and parameters to produce 
the model than the line process alone. 

SUMMARY OF THE INVENTION 

The present invention is embodied in an image analysis system which generates a motion vector field
from first and second images by: defining an objective function of the motion vectors in the field; locally
comparing motion vectors associated with a pixel value and its neighboring pixel values in the first image to
corresponding motion vectors associated with pixel values in the second image according to the objective
function; and minimizing the objective function based on only some of the neighboring motion vectors. The
rejected motion vectors are those neighboring motion vectors having a difference that is outside
of a threshold range. The neighboring motion vectors are rejected based only on locally available
motion vector values.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 (prior art) is a drawing of an exemplary input image according to the present invention.
Figure 2 (prior art) is a drawing of an image vector field for a motionless sequence of images according
to Figure 1.
Figure 3 (prior art) is a drawing of an image vector field for a sequence of images according to Figure 1
which include a moving component.
Figure 4 is a drawing of an estimated image vector field produced using a first embodiment of the
invention for images according to Figure 1.
Figure 5 is a drawing of an estimated image vector field produced using a second embodiment of the
invention for first, second and third images according to Figure 1.
Figure 6 is a block diagram of a parallel processor computer system suitable for implementing an
embodiment of the present invention.
Figure 7 is a flow-chart diagram which illustrates the generation of a motion vector field according to the
present invention.
Figure 8 is a block diagram of a video encoding system which uses an embodiment of the invention to
aid in motion adaptive encoding of video signals.
Figure 9 is a block diagram of a robot vision system which uses an embodiment of the invention to aid
in directing a robot arm, based on video input data.

DETAILED DESCRIPTION OF THE EXEMPLARY EMBODIMENTS 

To understand the operation of the proposed invention, it is helpful to review the original technique for 
defining motion vector fields using smoothness assumptions in order to understand the nature of the

When a camera moves relative to the objects being imaged, there are corresponding changes in the 
image. Disregarding, for the moment, the occlusion of areas and newly exposed areas, for every point of an 
image at time t, there exists a corresponding point in another image captured at a different time. We can
connect every such pair of points by a respective straight line to yield a set of motion vectors and to define
a displacement field (motion vector field) as the set of these vectors projected on the image plane. The
purpose of motion vector field estimation is to estimate such a motion vector field from an observed image
sequence. This motion vector field may then be used for various types of image processing that are useful
in such fields as computer vision, the motion compensated coding of moving images, noise reduction and 
frame-rate conversion. 

The following is a mathematical derivation of an exemplary method, according to the present invention, 
for generating data values representing a motion vector field from other data values representing individual
picture elements (pixels) of two or more images. The described method is not, however, a mathematical
method. This derivation is only presented to show the robustness of this method.

Let $\tilde{g}$ be the true underlying time-varying image observed via an ideal camera, and $g$ be the observed
image acquired from a normal camera, which relates to $\tilde{g}$ through some transformations in the acquisition
process such as filtering, gamma correction, quantization and distortion due to noise.
The observed image $g$ is sampled to produce a rectangular lattice $\Lambda$ of pixel values with vertical,
horizontal and temporal sampling periods $(T_v, T_h, T_t)$. The total number of pixels in an image frame is
denoted as $N = N_v \times N_h$. Pixels are ordered as $x_i = [mT_v, nT_h]$, where $i = mN_h + n$, $m = 0, \ldots, N_v - 1$ and
$n = 0, \ldots, N_h - 1$. Let $g_k$ and $g(x,k)$ be the k-th image frame and the intensity at spatial position x of the k-th
frame, respectively. Without loss of generality, we focus on the problem of estimating the motion vector
field from frames $g_k$ and $g_{k+l}$. Let $\tilde{d}_l$ and $d_l$ be the true and estimated motion vector fields, and $d_l(x_i)$
be the estimated motion vector at pixel $x_i$. Let $N_c(x_i)$ be the set of neighbors of pixel $x_i$ for a neighborhood
system of size c (i.e. $N_c(x_i) = \{x_j : 0 < \|x_j - x_i\|^2 \le c\}$), and let $|N_c(x_i)|$ denote the number of elements in $N_c(x_i)$.
Although it is contemplated that any neighborhood size may be used, for convenience in notation, the
exemplary embodiments of the invention described below assume a system which includes a central pixel
and its eight immediate neighbors (i.e. $N_2(x_i)$). An example of this neighborhood is shown in Figure 1 with the
reference number 110. In Figure 1, each rectangular box represents a respective pixel position in the
image. The central pixel in this neighborhood is identified by the reference number 112.

In order to obtain a motion vector field for the image shown in Figure 1 and other images (not shown) in 
its motion sequence, it is desirable to specify a structural model relating motion vectors and image intensity 
values and to make some assumptions about the underlying true motion. It is common, for example, to 
assume that image intensity along motion trajectories does not change. This assumption is quantified in 
equation (1). 



$$g_k(x_i) = g_{k+l}(x_i + d_l(x_i)) \qquad (1)$$

In addition, it is useful to assume that motion vectors vary smoothly in small neighborhoods. From
these two assumptions, one can estimate motion in an image by minimizing the following energy function
using the globally smooth (GS) motion model, $U_{GS}$, as defined in equation (2).

$$U_{GS}(d_l) = \sum_{i=0}^{N-1} \alpha(x_i, d_l(x_i), l) + \lambda_0 \sum_{(x_i, x_j)\ \text{are neighbors}} \beta(d_l(x_i), d_l(x_j)) \qquad (2)$$

where the displaced pixel difference square (PDS), $\alpha(x_i, d_l(x_i), l)$, is defined by equation (3)

$$\alpha(x_i, d_l(x_i), l) = [g_{k+l}(x_i + d_l(x_i)) - g_k(x_i)]^2 \qquad (3)$$

and the motion vector difference square (MDS), $\beta(d_l(x_i), d_l(x_j))$, by equation (4).

$$\beta(d_l(x_i), d_l(x_j)) = \|d_l(x_i) - d_l(x_j)\|^2 \qquad (4)$$

The parameter $\lambda_0$ is a weighting factor which represents the importance of motion information relative
to image intensity and the smoothness assumption.
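
As an illustration, equations (2) through (4) can be evaluated directly for a small image pair. The following Python sketch is only one possible reading: the function names, the integer-valued motion vectors, the clamping of displaced positions at the frame border and the fixed eight-pixel neighborhood are assumptions made for the example, not details taken from the description.

```python
# Sketch of equations (2)-(4) for a two-frame pair.
# Assumptions (hypothetical, for illustration only): frames are lists of rows,
# motion vectors are integer (row, column) tuples, displaced positions are
# clamped to the frame border, and the neighborhood is the 8 nearest pixels.

def pds(frame_k, frame_next, y, x, d):
    """Displaced pixel difference square, equation (3)."""
    h, w = len(frame_k), len(frame_k[0])
    yy = min(max(y + d[0], 0), h - 1)   # clamp displaced row
    xx = min(max(x + d[1], 0), w - 1)   # clamp displaced column
    return (frame_next[yy][xx] - frame_k[y][x]) ** 2

def mds(d_a, d_b):
    """Motion vector difference square, equation (4)."""
    return (d_a[0] - d_b[0]) ** 2 + (d_a[1] - d_b[1]) ** 2

def neighbors(y, x, h, w):
    """Eight immediate neighbors that fall inside the frame."""
    return [(y + dy, x + dx)
            for dy in (-1, 0, 1) for dx in (-1, 0, 1)
            if (dy, dx) != (0, 0) and 0 <= y + dy < h and 0 <= x + dx < w]

def energy_gs(frame_k, frame_next, field, lam0):
    """Equation (2): PDS over all pixels plus lam0 times MDS over neighbor pairs."""
    h, w = len(frame_k), len(frame_k[0])
    data = sum(pds(frame_k, frame_next, y, x, field[y][x])
               for y in range(h) for x in range(w))
    # Each unordered neighbor pair is visited twice below, hence the division by 2.
    smooth = sum(mds(field[y][x], field[ny][nx])
                 for y in range(h) for x in range(w)
                 for ny, nx in neighbors(y, x, h, w)) / 2
    return data + lam0 * smooth
```

For a constant image and an all-zero field the energy is zero; changing a single interior vector adds one MDS term for each of its eight neighbor pairs.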

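To reconcile the smoothness assumption with the rejection of local outliers, we can rewrite equation (2)
as equation (5).

$$U_{GS}(d_l) = \sum_{i=0}^{N-1} \left\{ \alpha(x_i, d_l(x_i), l) + \frac{\lambda}{|N(x_i)|} \sum_{x_j \in N(x_i)} \beta(d_l(x_i), d_l(x_j)) \right\} \qquad (5)$$

where

$$\lambda = |N(x_i)|\,\lambda_0 / 2.$$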

It is apparent from the above that, at the motion boundary, the outliers are the MDS $\beta(d_l(x_i), d_l(x_j))$ when
$x_j$ belongs to the other side of the motion boundary with respect to $x_i$. These outliers affect the estimate of
$d_l(x_i)$ through the average of MDS $\beta(d_l(x_i), d_l(x_j))$ and propagate the error to other pixels through the overall
energy function. If these outliers can be rejected before they enter into the average, they will not contribute
to the blurring of the estimated motion vector field. A method of rejecting outliers based on a threshold has
been found to produce good results. According to this method, the energy function of equation (5) is
modified as shown in equation (6).
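
$$U_{LO}(d_l) = \sum_{i=0}^{N-1} \left\{ \alpha(x_i, d_l(x_i), l) + \frac{\lambda}{N_a(x_i)} \sum_{x_j \in N(x_i)} \beta(d_l(x_i), d_l(x_j))\, \delta_j \right\} \qquad (6)$$

where the indicating function $\delta_j$ is defined by equation (7) below,

$$\delta_j = \begin{cases} 1 & \text{if } \beta(d_l(x_i), d_l(x_j)) \le T_{or}\,\beta_r \\ 0 & \text{else} \end{cases} \qquad (7)$$

and the total number of accepted neighbors is defined by equation (8).

$$N_a(x_i) = \sum_{x_j \in N(x_i)} \delta_j \qquad (8)$$

In equation (7), $\beta_r$ is the r-th rank of the ordered MDS of $\{\beta(d_l(x_i), d_l(x_j)),\ x_j \in N(x_i)\}$, that is to say

$$\beta_0 \le \beta_1 \le \beta_2 \le \cdots \le \beta_{|N(x_i)|-1} \qquad (9)$$

and $T_{or} \ge 1$ is the threshold constant.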

According to equation (6), outliers will be rejected from the average value of MDS if they have values
greater than $T_{or}$ times the reference $\beta_r$. Since this threshold is derived from the MDS in $N(x_i)$, the operation
of rejecting outliers is not sensitive to the amplitude of motion vectors. Thus, a system which operates
according to the equation (6) can distinguish outliers locally. In addition, in smooth areas of the image all
neighbors are accepted since the MDS of all neighbors in these areas is similar by definition.
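
The rank-based rejection of equations (7) through (9) can be sketched for a single pixel as follows. This is a minimal illustration rather than the implementation: the function names are hypothetical, the default rank of 2 and threshold constant of 1.0 are illustrative choices, and the neighbor list is assumed to be non-empty.

```python
# Sketch of the per-pixel outlier rejection in equations (6)-(9).
# Assumptions (hypothetical): motion vectors are (row, column) tuples and the
# caller supplies the vectors of the pixel's neighbors in any order.

def mds(d_a, d_b):
    """Motion vector difference square, equation (4)."""
    return (d_a[0] - d_b[0]) ** 2 + (d_a[1] - d_b[1]) ** 2

def accepted_smoothness(d_center, d_neighbors, rank=2, t_or=1.0):
    """Return (average MDS over accepted neighbors, number accepted)."""
    betas = [mds(d_center, d) for d in d_neighbors]
    ordered = sorted(betas)                           # equation (9)
    beta_r = ordered[min(rank, len(ordered) - 1)]     # r-th rank reference
    limit = t_or * beta_r
    deltas = [1 if b <= limit else 0 for b in betas]  # equation (7)
    n_accepted = sum(deltas)                          # equation (8)
    total = sum(b * d for b, d in zip(betas, deltas))
    return total / n_accepted, n_accepted
```

With a threshold constant of at least 1, the lowest-ranked neighbor is always accepted, so the division is safe. Selecting the highest rank reproduces the plain average used by the globally smooth model.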

The selection of the threshold reference $\beta_r$ involves two conflicting factors: the largest number of
outliers that may be rejected and the number of neighbors needed to propagate the assumption of
smoothness. If the highest rank is selected then no outliers are rejected. If the lowest rank is selected, all
neighbors except one are rejected as outliers and the smoothness assumption can propagate in only one
direction. The inventor has determined that good performance is obtained for images which contain moving
rectangular objects when $\beta_2$ is used as the reference, since it allows five outliers surrounding a moving
corner.

It is contemplated that other measures may be used to exclude outliers based on local measurements.
For example, the threshold $T_{or}$ may be set to the median value of all of the samples in the neighborhood.
The use of this threshold value has an advantage since it decreases the number of computations needed to
process each pixel in the image.

The method described above is applicable for generating a motion vector field for one image with
reference to another adjacent field. As described above, however, this may not be sufficient to handle the
occlusion problem. To handle occlusion, two other images, one before and one after the target image, are
used to generate the motion vector field for the target image.

The method chooses between the best forward motion and the best backward motion as determined
from the following frame and the preceding frame, respectively.

For the globally smooth motion model, the energy function, as described by equation (10) below, can
be derived from equation (5), described above.

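$$U_{GSFB}(d_{l_1}, d_{l_2}) = \sum_{i=0}^{N-1} \min_{l = l_1, l_2} \left\{ w_l \left\{ \alpha(x_i, d_l(x_i), l) + \frac{\lambda}{|N(x_i)|} \sum_{x_j \in N(x_i)} \beta(d_l(x_i), d_l(x_j)) \right\} \right\} \qquad (10)$$
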

where the factors $w_l$ are weights which determine the relative importance of motion information from the
different frames.

To reduce computation, it is assumed that motion vector fields at each pixel follow constant translation.
This is represented by equation (11).

The energy function using the globally smooth motion model with forward and backward motion may be
defined using equation (12).

$$U_{GSFB}(d_l) = \sum_{i=0}^{N-1} \min_{l = l_1, l_2} \left\{ w_l \left\{ \alpha(x_i, d_l(x_i), l) + \frac{\lambda}{|N(x_i)|} \sum_{x_j \in N(x_i)} \beta(d_l(x_i), d_l(x_j)) \right\} \right\} \qquad (12)$$
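
The per-pixel minimum over the forward and backward frames in equation (12) can be sketched as below. The function name and the dictionary layout keyed by the frame offset l are assumptions for the example; the smoothness values passed in are assumed to include the weighting factor already.

```python
# Sketch of the per-pixel choice between forward and backward motion.
# Assumptions (hypothetical): each dictionary is keyed by the frame offset l
# (for example +1 for the following frame, -1 for the preceding frame).

def best_direction(pds_by_l, smooth_by_l, weights):
    """Return (l, energy) for the frame offset with the smaller weighted energy."""
    energies = {l: weights[l] * (pds_by_l[l] + smooth_by_l[l]) for l in pds_by_l}
    best_l = min(energies, key=energies.get)
    return best_l, energies[best_l]
```

A pixel that is occluded in the following frame typically has a large forward PDS, so the backward term is selected and the occluded match does not contribute to the total energy.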



Similarly, the energy function which uses local outliers for forward and backward motion may be
defined using equation (13).

$$U_{LOFB}(d_l) = \sum_{i=0}^{N-1} \min_{l = l_1, l_2} \left\{ w_l \left\{ \alpha(x_i, d_l(x_i), l) + \frac{\lambda}{N_a(x_i)} \sum_{x_j \in N(x_i)} \beta(d_l(x_i), d_l(x_j))\, \delta_j \right\} \right\} \qquad (13)$$

The estimate of motion vector fields can be determined by minimizing either equation (12) or equation
(13) using three given frames as constrained by equation (11). In this operation, when $l_1 = -l_2$, it is helpful
to assume the following:

$w_{l_1} = w_{l_2}$ and



(15) 
(16) 



Many methods may be used to minimize the energy functions in equations (5), (6), (12) and (13). The
inventor has selected a technique known as simulated annealing as described in the above-referenced
paper by Geman et al.

The following is a summary description of this algorithm. Assume $S_d$ represents the set of possible
motion vector values having accuracy $\Delta$ and total levels $2N_d + 1$ in each direction. This is stated in equation
(17).

$$S_d = \{ [r_1 \Delta,\ r_2 \Delta] : r_1, r_2 = 0, \pm 1, \ldots, \pm N_d \} \qquad (17)$$

Assume $I = \{0, \ldots, N-1\}$ and $I_i = I - \{i\}$. It is desired to estimate an unknown motion vector field $d_l$
with energy function $U(d_l)$. The motion vector field $d_l$ may be modeled as a random field with the Gibbs
distribution, as shown in equation (18).

$$P(d_l) = \frac{1}{Z}\, e^{-U(d_l)/T} \qquad (18)$$

In this equation the variable T is a simulated temperature for annealing and Z is a normalizing parameter
such that $\sum P(d_l) = 1$. Using Bayes rule and the law of total probability, the probability of a motion vector at
a current pixel $x_i$, given motion vectors of pixels other than pixel $x_i$, may be expressed as shown in
equations (19) and (20).



$$P(d_l(x_i) = \hat{d}_l(x_i) \mid d_l(x_q) = \hat{d}_l(x_q),\ q \in I_i) = \frac{P(d_l(x_q) = \hat{d}_l(x_q),\ q \in I)}{\sum_{z \in S_d} P(d_l(x_i) = z,\ d_l(x_q) = \hat{d}_l(x_q),\ q \in I_i)} \qquad (19)$$

$$= \frac{e^{-U(d_l(x_q) = \hat{d}_l(x_q),\ q \in I)/T}}{\sum_{z \in S_d} e^{-U(d_l(x_i) = z,\ d_l(x_q) = \hat{d}_l(x_q),\ q \in I_i)/T}} \qquad (20)$$



If $T_0$ is assumed to be the initial simulated temperature, $T_f$ is assumed to be the final simulated
temperature, and $d_l^{(0)}$ is an arbitrary initial estimate of the motion vector field, then the algorithm of simulated
annealing can be used to minimize the energy functions of equations (5), (6), (12) and (13) using the
following algorithm.



    Set temperature to T_0
    While temperature is greater than T_f
        For i = 0, ..., N-1
            Replace d_l(x_i) in the chosen energy
            function by the random sample generated
            from the conditional probability in
            equation (20)
        Decrease temperature following some
        annealing schedule
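
The loop above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the described implementation: the names `anneal` and `local_energy` are hypothetical, the field is stored as a flat list, pixels are updated one after another rather than in parallel, and the decay constant 0.95 is an arbitrary choice for the annealing schedule.

```python
import math
import random

def anneal(field, candidates, local_energy, t0=500.0, tf=0.1, a=0.95, rng=None):
    """Simulated annealing with a Gibbs sampler over a flat list of motion vectors.

    local_energy(field, i, z) is a caller-supplied (hypothetical) function that
    returns the energy contribution when pixel i takes candidate vector z.
    """
    rng = rng or random.Random(0)
    t = t0
    while t > tf:
        for i in range(len(field)):
            energies = [local_energy(field, i, z) for z in candidates]
            e_min = min(energies)
            # Unnormalized conditional probabilities as in equation (20);
            # subtracting e_min avoids overflow and does not change the ratios.
            weights = [math.exp(-(e - e_min) / t) for e in energies]
            field[i] = rng.choices(candidates, weights=weights, k=1)[0]
        t *= a   # exponential annealing schedule
    return field
```

At high temperature the weights are nearly uniform, so the field is sampled almost at random; as the temperature falls, the sample concentrates on the lowest-energy candidates.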






The inventor has determined that the annealing schedule should start from a very high simulated
temperature (e.g. $T_0 = 500$) and end at a low temperature (e.g. $T_f = 0.1$). In addition, the inventor has
determined that an annealing schedule which produces good results defines the simulated temperature $T_k$
for the k-th iteration by the exponential function of equation (21).

$$T_k = T_0\, a^{k-1} \qquad (21)$$

where "a" is a constant slightly less than 1.0. A two-dimensional random sample is generated from the
bivariate discrete probability distribution in equation (20) by first generating the vertical component from the
one-dimensional marginal cumulative distribution obtained by accumulating the two-dimensional distribution
horizontally. The horizontal component can then be generated from the one-dimensional cumulative
distribution obtained from the two-dimensional distribution given by the generated vertical component.
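
The two-stage sampling just described can be sketched as follows. The function names and the table layout `prob[v][h]` (vertical index first) are assumptions for the example; the table does not need to be normalized.

```python
import random
from itertools import accumulate

def draw(weights, rng):
    """Draw an index from a one-dimensional distribution via its cumulative sum."""
    cumulative = list(accumulate(weights))
    u = rng.random() * cumulative[-1]
    for index, c in enumerate(cumulative):
        if u < c:
            return index
    return len(cumulative) - 1   # guard against floating-point round-off

def sample_2d(prob, rng):
    """Sample (vertical, horizontal) indices from a two-dimensional table."""
    row_sums = [sum(row) for row in prob]   # accumulate horizontally: marginal
    v = draw(row_sums, rng)                 # vertical component first
    h = draw(prob[v], rng)                  # then horizontal, given the vertical
    return v, h
```

Because the second draw uses only the selected row, cells with zero probability are never returned.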

For the energy functions in equations (5), (6), (12) and (13), the corresponding conditional probability of
equation (20) can be reduced to relate to only some local energy functions. For example, for the energy
function of the globally smooth motion models, the local energy function of the current pixel, $x_i$, is given by
equation (22)

$$U_{GS}^{i}(d_l(x_i) = z,\ d_l(x_q) = \hat{d}_l(x_q),\ x_q \in V(x_i)) = \sum_{x_j \in N^+(x_i)} \left\{ \alpha(x_j, d_l(x_j), l) + \frac{\lambda}{|N(x_j)|} \sum_{x_m \in N(x_j)} \beta(d_l(x_j), d_l(x_m)) \right\} \qquad (22)$$

where

$$N^+(x_i) = N(x_i) \cup \{x_i\} \quad \text{and} \quad V(x_i) = \bigcup_{x_q \in N^+(x_i)} N(x_q) - \{x_i\}$$



The first term in equation (19) can be pre-computed before the simulated annealing starts. The second
term only depends on the motion vector of the surrounding pixels in $N(x_i)$. Accordingly, the new motion
vectors for all pixels in the image may be calculated simultaneously. Thus, the minimization problem may
be solved on a pixel-by-pixel basis using a highly parallel processor such as the processor shown in Figure
6 described below. An exemplary processor suitable for use in generating motion vector fields according to
the subject invention is the Princeton Engine, as described in a paper by D. Chin et al. entitled "The
Princeton Engine: A Real-Time Video System Simulator", IEEE Transactions on Consumer Electronics, May
1988, pp 285-297.

It is noted that, while equation (22) concerns the globally smooth motion model, it may be readily 
modified in the same manner as between equations (5) and (6) and equations (12) and (13) to produce an 
equivalent energy function which uses local outlier rejection. In addition, for the globally smooth motion 
model, equation (22) may be further reduced to produce the energy function defined by equation (23). 



$$U_{GS}^{i}(d_l(x_i) = z,\ d_l(x_q) = \hat{d}_l(x_q),\ x_q \in N(x_i)) = \alpha(x_i, z, l) + \frac{2\lambda}{|N(x_i)|} \sum_{x_j \in N(x_i)} \beta(z, \hat{d}_l(x_j)) \qquad (23)$$
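
The reduced local energy of equation (23) depends only on the candidate vector, its displaced pixel difference and the current vectors of the neighbors. A minimal sketch, with hypothetical function and parameter names, might look like this:

```python
# Sketch of the reduced local energy in equation (23).
# Assumptions (hypothetical): pds_z is the displaced pixel difference square
# already computed for candidate z, and neighbor_vectors is non-empty.

def local_energy_gs(pds_z, z, neighbor_vectors, lam):
    """alpha(x_i, z, l) + (2 * lam / |N(x_i)|) * sum of beta(z, d_l(x_j))."""
    smooth = sum((z[0] - d[0]) ** 2 + (z[1] - d[1]) ** 2 for d in neighbor_vectors)
    return pds_z + 2.0 * lam / len(neighbor_vectors) * smooth

def lowest_energy_candidate(pds_table, neighbor_vectors, lam):
    """Return the candidate in pds_table (a dict z -> PDS) with the least local energy."""
    return min(pds_table,
               key=lambda z: local_energy_gs(pds_table[z], z, neighbor_vectors, lam))
```

Since the PDS values depend only on the image data, a table such as `pds_table` can be filled once per pixel before annealing starts, leaving only the neighbor term to be recomputed at each iteration.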



Reference is now made to Figures 1-5 to describe the operation of the process. In Figure 1, a square
central portion 114 is defined in an image frame 100. In this image, each small rectangle corresponds to a
respectively different pixel value. The exemplary image data used by the apparatus and methods described
has pixels with random values. The image is shown having contrasting central and surrounding parts for
clarity in the description.

If there is no motion from this frame to the next frame, a motion vector field such as that shown in 
Figure 2 is generated. In this motion vector field, all vector elements are zero, indicating no motion in the
image.

If, however, the central area 114 moves to the position 116, as indicated by the broken-line box,
between the current frame and the next frame, then the motion vector for each pixel in the area should
indicate that the pixel has moved in the direction of the motion.

Using prior art techniques, however, which assume globally smooth motion, a motion vector field such
as that shown in Figure 3 is generated. In this field, a central area 114' corresponds to the area 114 in the
frame image shown in Figure 1. It is noted that the motion vectors in the center of the area 114' are correct
but that the vectors at the motion boundaries, such as that shown in the area 310, are incorrect. The motion
vectors above and to the left of the area 114' are erroneous due to occlusion. The motion vectors below and
to the right of the area 114' are erroneous due to motion discontinuities.

If this data were applied to video encoding apparatus, erroneous motion vectors may be generated, 
increasing the amount of data needed to represent the encoded image frame. 

When a method according to the present invention is used to generate the motion vector field from two
frames, one containing the area 114 and the other containing the area 116, a motion vector field such as
that shown in Figure 4 is generated. It is noted that errors related to the motion discontinuities, such as the
errors in the area 410, have been eliminated but errors related to occlusion, such as in the area 310,
remain.
As described above, errors related to occlusion may be eliminated by using a method according to the 
present invention in which two other frames, one before and one after the current frame are used with the 
current frame to generate the data values representing the motion vector field. An exemplary motion vector 
field generated by this method is shown in Figure 5. It is noted that there are no significant errors in any of
the motion vectors which make up the motion vector field. 

As described above, Figure 6 is a block diagram of a highly parallel processor system which may be
used to generate motion vector fields in accordance with the present invention. This processor system
includes $N_h$ times $N_v$ processors $P_{0,0}$ through $P_{N_v-1,N_h-1}$. Thus, the processor system has one processor for
each pixel in the current image. It is contemplated that other parallel architectures may be used having
fewer than one processor per pixel or that the process described below may be implemented on a single
processor by serially processing each pixel in the current frame.

As shown in Figure 6, input samples corresponding to a current field, $F_k$, a previous field, $F_{k-1}$, and a
next field, $F_{k+1}$, are applied to an input/output (I/O) processor 610. This processor stores the samples into a
multi-port memory 614 under control of a control processor 612. The memory 614 may be, for example, a
distributed memory having a respectively different portion co-resident with each of the plurality of
processors, $P_{0,0}$ through $P_{N_v-1,N_h-1}$. The control processor 612 and the processors $P_{0,0}$ through $P_{N_v-1,N_h-1}$ operate
according to the method described below with reference to Figure 7 to produce the samples representing
the motion vector field from samples representing two or three image fields.

Although the I/O processor 610 is shown as receiving three frames of data, it is contemplated that, in 
steady state operation, only one new frame of data will be applied to the processor 610 at any given time.
Two of the previously stored frames will simply be re-designated such that the stored data corresponding to
the frame $F_k$ will become data representing frame $F_{k+1}$ and the stored data corresponding to the frame $F_{k-1}$
will become data representing frame $F_k$.

It is also contemplated that the two-frame method of generating the motion vector field, described
above, may also be implemented using the processor system shown in Figure 6. In this instance, only two
frames, $F_k$ and $F_{k+1}$, are stored in the memory 614 and used by the processors $P_{0,0}$ through $P_{N_v-1,N_h-1}$.

Figure 7 is a flow-chart diagram which illustrates the overall operation of an exemplary embodiment of
the invention. In the first step in this process, step 710, the source images are stored into the memory 614
by the I/O processor 610. As set forth above, the memory 614 may be a monolithic multi-port memory or
may be distributed among the processors $P_{0,0}$ through $P_{N_v-1,N_h-1}$. Accordingly, the act of storing the pixel
values into the memory also assigns the values to the respective processors. In the exemplary embodiment
of the invention, each processor only needs access to its target pixel value and neighboring pixel values
from both the current frame and the other frame or frames being input to the system.

Next, at step 712, each of the individual processors generates a random motion vector for its pixel. This
may be done, for example, by using a pseudo-random number generator for which each processor uses a
respectively different seed value.
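
Step 712 can be sketched in a serial setting by giving each pixel its own seeded generator. The function name, the way seeds are derived from the pixel index and the integer vector range are assumptions made for illustration.

```python
import random

def random_initial_field(n_v, n_h, n_d, base_seed=0):
    """Return an n_v x n_h field of random integer vectors in [-n_d, n_d]^2.

    Each pixel uses its own generator with a distinct seed (a hypothetical
    stand-in for one pseudo-random generator per processor).
    """
    field = []
    for m in range(n_v):
        row = []
        for n in range(n_h):
            rng = random.Random(base_seed + m * n_h + n)   # distinct seed per pixel
            row.append((rng.randint(-n_d, n_d), rng.randint(-n_d, n_d)))
        field.append(row)
    return field
```

Seeding per pixel keeps the initial field reproducible regardless of the order in which pixels are processed.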

At step 714 one of the globally smooth energy functions is defined for each pixel in the frame. These 
functions are defined by equations (5) and (12) above. At step 716. the initial and final simulated 
temperatures are set. As described above, exemplary values for these temperatures are 500 and 0.1 
respectively At step 720. if the current simulated temperature value is greater than the final simulated 
ss temperature value, control is passed to step 722. Otherwise, the process is complete at step 732. 

Step 722 compares the current simulated temperature value to a switching temperature value, T_s. If the
current temperature is less than T_s, then the energy function is switched from one which assumes a globally
smooth image to one which rejects local outliers (i.e. from equation (5) to equation (6) or from equation (12)
to equation (13)). The globally smooth energy function is used to reduce the computation overhead as
[...] it produces errors at moving edges in the image. The energy function which
rejects outliers is then used to correct motion vectors at motion contours in the image.
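
The switch at step 722 can be illustrated with a short sketch. Equations (5), (6), (12) and (13) are not reproduced in this passage, so the code uses stand-ins: a sum of squared differences between the target vector and its neighbors for the globally smooth case, and the same sum with a fixed number of the largest terms dropped for the outlier-rejecting case. Treating "largest magnitude" as "largest squared difference from the target" is an assumption, as are the function names and the default `n_reject`. Only the smoothness term is shown; any image-matching term is omitted.

```python
def squared_diffs(target, neighbors):
    """Squared distance from the target vector to each neighboring vector."""
    return [(target[0] - n[0]) ** 2 + (target[1] - n[1]) ** 2 for n in neighbors]

def smooth_energy(target, neighbors):
    """Stand-in for the globally smooth energy: every neighbor contributes."""
    return float(sum(squared_diffs(target, neighbors)))

def outlier_rejecting_energy(target, neighbors, n_reject=3):
    """Stand-in for the outlier-rejecting energy: drop the n_reject largest terms."""
    keep = max(len(neighbors) - n_reject, 0)
    return float(sum(sorted(squared_diffs(target, neighbors))[:keep]))

def energy_for_temperature(temperature, t_switch):
    """Step 722: below the switching temperature, use the outlier-rejecting form."""
    return outlier_rejecting_energy if temperature < t_switch else smooth_energy
```

For a pixel on a motion boundary with five neighbors moving by (1, 0) and three moving by (6, 6), the smooth form penalizes the vector (1, 0) for disagreeing with the three neighbors across the boundary, while the outlier-rejecting form does not.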

Whichever energy function is selected, each of the processors P_{0,0} through P_{Nv-1,Nh-1} generates a new
random motion vector in accordance with equation (22). Alternatively, a new random motion vector may be
generated in accordance with an equation (not shown), similar to equation (22), but modified in the same
manner as equation (6) is modified from equation (5), to omit outliers from the generation of the motion
vector.

Next, at step 728, a new value for the energy function is generated based on the new motion vector. At
the next step, the simulated temperature is reduced and control is transferred to step 720.
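
The loop formed by steps 716 through 732 can be sketched for a single pixel as follows. Several parts are assumptions rather than the patent's method: equation (22) is not reproduced here, so candidate vectors are drawn uniformly and accepted with the standard Metropolis rule; the cooling factor, the switching temperature `t_switch`, the number of draws per temperature, and the stand-in energy (squared differences with the `n_reject` largest terms dropped) are hypothetical. The temperatures 500 and 0.1 are the exemplary values given in the text.

```python
import math
import random

def energy(v, neighbors, n_reject):
    """Stand-in smoothness energy; n_reject=0 gives the globally smooth form."""
    d2 = sorted((v[0] - n[0]) ** 2 + (v[1] - n[1]) ** 2 for n in neighbors)
    return float(sum(d2[:max(len(d2) - n_reject, 0)]))

def anneal_vector(neighbors, t_initial=500.0, t_final=0.1, t_switch=10.0,
                  cooling=0.95, draws_per_temp=20, max_disp=8,
                  n_reject=3, seed=0):
    """Anneal one motion vector against fixed neighboring vectors."""
    rng = random.Random(seed)

    def rand_vec():
        return (rng.randint(-max_disp, max_disp),
                rng.randint(-max_disp, max_disp))

    current = rand_vec()                         # step 712: random start
    t = t_initial                                # step 716: set temperatures
    while t > t_final:                           # step 720: finished?
        k = n_reject if t < t_switch else 0      # step 722: switch energy
        e_cur = energy(current, neighbors, k)
        for _ in range(draws_per_temp):
            cand = rand_vec()                    # new random vector
            e_new = energy(cand, neighbors, k)   # step 728: new energy value
            # Metropolis acceptance (assumed stand-in for equation (22)).
            if e_new <= e_cur or rng.random() < math.exp((e_cur - e_new) / t):
                current, e_cur = cand, e_new
        t *= cooling                             # reduce the temperature
    return current
```

Setting `draws_per_temp=1` follows the flowchart more literally, with one new vector per temperature step; the larger default is only there to give this toy example more chances to settle.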

When the operation is complete, each of the processors P_{0,0} through P_{Nv-1,Nh-1} contains a value
representing the motion vector for its corresponding pixel. These motion vectors are read from the memory
614 as a motion vector field by the control processor 612 and are applied to circuitry which uses the motion
vector field.

It is contemplated that the energy function which rejects local outliers in [...]
value may be used as the only energy function in the process shown in Figure 7. This may [...]

Two exemplary systems which use motion vector field data are shown in Figures 8 and 9. Figure 8 is a
[...] SYSTEM, which is hereby incorporated by reference for its teachings on motion-adaptive [...]

[...] method described by the flowchart diagram in Figure 7 using equations (5), (6), (12), (13) and [...] in which
[...]

Motion vector fields may be used in a system of this type to determine the relative motion of portions of
the image corresponding to the robot arm 912 and other portions of the image corresponding to workpieces
which are to be manipulated by the arm 912. 

Although the invention has been described in terms of exemplary embodiments, it is contemplated that 
it may be practised as outlined above within the spirit and scope of the claims. 



Claims

1. Apparatus for generating sample values representing a motion vector field which describes motion of
individual image components of a current image frame and corresponding image components of at 
least one other image frame in a sequence of image frames, the apparatus comprising: 

initialization means for generating a plurality of initial motion vector values, each corresponding to 
an estimate of motion between one of the image components in the current image frame and a 
corresponding image component in the one other image frame;

means for generating an energy function value for a target motion vector which corresponds to a
current image component in the current frame, said energy function being a function of the target
motion vector and ones of the neighboring motion vectors which correspond to ones of the image
components that surround the current image component in the current image frame, wherein a
neighboring motion vector is ignored as being an outlier motion vector if the neighboring motion vector
differs from the target motion vector by more than a predetermined threshold; and 

means for minimizing the energy function of the current image by modifying the respective motion 
vector values to produce the sample values representing the motion vector field. 

2. Apparatus according to claim 1, wherein each image component is an individual picture element (pixel)
of the image. 

3. Apparatus according to claim 1, wherein each image component includes a plurality of individual 
picture elements (pixels). 

4. Apparatus according to claim 1, wherein the means for generating an energy function value for the
target motion vector includes means for rejecting the outlier motion vector as a function of only the
neighboring motion vectors which are adjacent to the target motion vector in the motion vector field. 

5. Apparatus according to claim 4, wherein the means for generating the energy function value includes:

means for comparing each of N_c neighboring motion vectors which surround the target motion
vector, where N_c is a positive integer, to identify a predetermined number of the N_c neighboring motion
vectors which have larger magnitudes than any other ones of the N_c neighboring motion vectors; and

means for rejecting the identified neighboring motion vectors as the outlier motion vectors.

6. Apparatus according to claim 4, wherein the means for generating the energy function value includes:

means for evaluating each of N_c neighboring motion vectors which surround the target motion
vector, where N_c is a positive integer, to assign a magnitude value to each of the N_c neighboring motion
vectors;

means for identifying one of the assigned magnitude values as being a median magnitude value for
the N_c neighboring motion vectors; and

means for rejecting the identified neighboring motion vectors which have magnitude values greater
than the median magnitude value as being the outlier motion vectors.

7. Apparatus according to claim 4, wherein the means for minimizing the respective energy function
values is an iterative process which includes means for switching the energy function used by the 
means for generating an energy function value, at a predetermined instant in the iterative process, from 
a first energy function, which uses all neighboring motion vectors, to a second energy function, which 
rejects outlier motion vectors. 

8. Apparatus according to claim 1, wherein the motion vector field describes motion of individual image
components of the current image frame and corresponding image components in first and second
image frames which differ from the current frame, wherein:

the initialization means includes means for generating a first plurality of initial motion vector values,
each corresponding to an estimate of motion between one of the image components in the current
image frame and a corresponding image component in the first image frame, and for generating a
second plurality of initial motion vector values, each corresponding to an estimate of motion between
one of the image components in the current image frame and a corresponding image component in the
second image frame; and

the means for generating an energy function value for the target motion vector includes:

means for generating a first energy function value for the target motion vector, the first energy
function being a function of the target motion vector and ones of the first plurality of neighboring motion
vectors;

means for generating a second energy function value for the target motion vector, the second 
energy function being a function of the target motion vector and ones of the second plurality of 
neighboring motion vectors; and 

means for selecting, as the generated energy function value, one of the first and second energy 
function values based on differences in magnitude between the first and second energy function values. 

9. Apparatus according to claim 8, wherein the means for generating an energy function value for the 
target motion vector includes: 

means for generating a third energy function value for the target motion vector, the third energy 
function being a function of the target motion vector and ones of the first and second plurality of 
neighboring motion vectors; and 

means for selecting, as the generated energy function value, one of the first, second and third
energy function values based on differences in magnitude among the first, second and third energy 
function values. 

10. Apparatus according to claim 1, wherein the means for minimizing the energy function in the current
image includes means for processing the energy function using a simulated annealing algorithm.

11. A method of generating sample values representing a motion vector field which describes motion of
individual image components of a current image frame and corresponding image components of at 
least one other image frame in a sequence of image frames, the method comprising the steps of: 

a) generating a plurality of initial motion vector values, each corresponding to a respectively different 
one of the image components in the current frame; 

b) generating an energy function value for a target motion vector which corresponds to a current
image component in the current image frame, said energy function being a function of the target 
motion vector and ones of the neighboring motion vectors which correspond to image components 
that surround the current image component in the current image frame, wherein a neighboring 
motion vector is ignored as being an outlier motion vector if the neighboring motion vector differs 
from the target motion vector by more than a predetermined threshold; and 

c) minimizing the energy function of the current image by modifying the respective motion vector 
values to produce the sample values representing the motion vector field. 

12. A method according to claim 11, wherein each image component is an individual picture element (pixel) 
of the image. 

13. A method according to claim 11, wherein each image component includes a plurality of individual
picture elements (pixels). 

14. A method according to claim 11, wherein step b) includes the step of rejecting the outlier motion vector
based only on the neighboring motion vectors which are adjacent to the target motion vector in the 
motion vector field. 

15. A method according to claim 14, wherein step c) is an iterative process which includes the step of
switching the energy function used by step b) at a predetermined instant in the iterative process, from 
a first energy function, which uses all neighboring motion vectors, to a second energy function, which 
rejects outlier motion vectors.

16. A method according to claim 11, wherein the motion vector field describes motion of individual image
components of the current image frame and corresponding image components in first and second
image frames which differ from the current image frame, wherein:

step a) includes the steps of generating a first plurality of initial motion vector values, each
corresponding to an estimate of motion between one of the pixels in the current image frame and a
corresponding pixel in the first image frame, and generating a second plurality of initial motion
vector values, each corresponding to an estimate of motion between one of the image components
in the current image frame and a corresponding image component in the second image frame; and

step b) includes the steps of:

generating a first energy function value for the target motion vector, the first energy function
being a function of the target motion vector and ones of the first plurality of neighboring motion
vectors;

generating a second energy function value for the target motion vector, the second energy
function being a function of the target motion vector and ones of the second plurality of neighboring
motion vectors; and

selecting, as the generated energy function value, one of the first and second energy function
values based on differences in magnitude between the first and second energy function values.

17. A method according to claim 16, wherein the step of generating an energy function value for the target
motion vector includes the steps of:

generating a third energy function value for the target motion vector, the third energy function
being a function of the target motion vector and ones of the first and second plurality of neighboring
motion vectors; and

selecting, as the generated energy function value, one of the first, second and third energy function
values based on differences in magnitude among the first, second and third energy function values.

18. A method according to claim 11, wherein step c) includes the step of processing the energy function
using a simulated annealing algorithm.